The following analysis is parallel to the analysis performed in the manuscript, however it is preceded by the non-linear transformation steps outlined below.
We began by evaluating the use of cubic splines with a penalty score
encouraging monotonicity using a general additive model in the
mgcv package. We iteratively increased the number of
degrees from 1-5, and evaluated the fits using AIC to select the
strongest model for each landscape, then logged the number of degrees
that should be used for each transform.
| Unique_ID | Best_Degree | Best_AIC |
|---|---|---|
| AP_catef_1 | 4 | 79.375299 |
| DHFR_ic50_c57 | 1 | 22.838106 |
| DHFR_ic50_c58 | 1 | 29.992922 |
| DHFR_ic50_c59 | 4 | 18.441344 |
| DHFR_ic50_c60 | 5 | 24.490984 |
| DHFR_ic50_c61 | 5 | 9.666351 |
| DHFR_ic75_palmer | 5 | 197.681499 |
| DHFR_kcat_trajg | 4 | 6.681135 |
| DHFR_kcat_trajr | 5 | -3.673020 |
| DHFR_ki_trajg | 5 | 18.610525 |
| DHFR_ki_trajr | 4 | 6.720748 |
| MPH_catact_CaPTM | 1 | 35.963443 |
| MPH_catact_CdPTM | 1 | -11.111711 |
| MPH_catact_CoPTM | 1 | 24.892223 |
| MPH_catact_CuPTM | 1 | 23.676399 |
| MPH_catact_MgPTM | 1 | 26.680008 |
| MPH_catact_MnPTM | 1 | 11.392128 |
| MPH_catact_NiPTM | 1 | 4.306652 |
| MPH_catact_ZnPTM | 4 | 46.094248 |
| NfsA_ec50_2039 | 5 | -101.318022 |
| NfsA_ec50_3637 | 5 | -65.972370 |
| OXA-48_ic50_CAZtraj1 | 5 | -32.759419 |
| OXA-48_ic50_CAZtraj2 | 4 | -26.712493 |
| OXA-48_ic50_CAZtraj3 | 1 | -99.376569 |
| OXA-48_ic50_PIPtraj1 | 5 | 6.216836 |
| OXA-48_ic50_PIPtraj2 | 5 | 28.085051 |
| OXA-48_ic50_PIPtraj3 | 4 | -20.227528 |
| PTE_catact_2NH | 5 | 54.866422 |
| PTE_catact_butyrate | 5 | -15.140782 |
| TEM_MIC_weinreich | 5 | 39.737284 |
| TEM_growth_AM | 4 | 13.245353 |
| TEM_growth_AMC | 4 | 7.064258 |
| TEM_growth_AMP | 5 | 42.425104 |
| TEM_growth_CAZ | 1 | 40.127210 |
| TEM_growth_CEC | 1 | 39.052728 |
| TEM_growth_CPD | 5 | 32.778424 |
| TEM_growth_CPR | 1 | 30.292993 |
| TEM_growth_CRO | 5 | 40.952662 |
| TEM_growth_CTT | 1 | 45.810669 |
| TEM_growth_CTX | 5 | 36.284359 |
| TEM_growth_CXM | 4 | 23.141110 |
| TEM_growth_FEP | 1 | 26.596062 |
| TEM_growth_SAM | 5 | 22.150169 |
| TEM_growth_TZP | 5 | 21.309300 |
| TEM_growth_ZOX | 5 | 23.862908 |
We then visually evaluated the cubic spline transformations (in blue) for each landscape with the x-axix representing additive effects, and y axis representing observed effects.
From the fits we notice a certain degree of overfitting. One approach to alleviate this would be to arbitrarily lower the number of degrees, however instead we opted to use a different transformation method which is less prone to overfitting.
The drawback of using monotonic splies (or even power transforms), as can be seen from the plots above, is the lack of bounding. Bounding is crucial to avoid greatly transforming phenotype values that lay outside of the bounds of the transform, which itself is constrained by the range of the predicted first-order effects.
In other words, phenotypes with magnitudes that lay outside of the range of predicted values by the first-order model are at risk of being incorrectly transformed. Though this is somewhat true for a four-parameter model, the bounded upper- and lower- thresholds ensure that phenotypes that lay outside of the predicted range will be approximated more accurately, i.e., the transformation will be lower in magnitude than other models.
We attempted to transform all datasets using the four-parameter
function, with fitting performed by non-linear least squares regression
using nlsLM. The fits are compared to a simple linear model
fit using an AIC to determine whether the additional information
stemming from the four-parameter transform is parsimonious, otherwise,
no transform is applied.
## Error in nlsModel(formula, mf, start, wts) :
## singular gradient matrix at initial parameter estimates
## Error in nlsModel(formula, mf, start, wts) :
## singular gradient matrix at initial parameter estimates
Indeed, these fits appear much better visually than the spline fits, as they avoid overfitting to datasets. We used these transforms in the subsequent analyses. Note: we removed landscapes TEM_growth_AMP, TEM_growth_AMC, TEM_growth_AMP, and TEM_growth_CAZ, as the four-parameter model was more parsimonous than the linear model, however the fits reduced all values in the landscape to binary values that represented the upper or lower bounds of the four-parameter model.
The spread of functional contributions of the positions. This shows whether introducing a mutation in a given position impacts the function positively or negatively (or neutral) across all backgrounds
We have 198 positions.
First order positions which have a heterogeneity index > log10 1.5-fold, 2-fold, 5-fold, and 10-fold for adaptive trajectories
| mutations | idiosync_sig_1.5 | idiosync_sig_2 | idiosync_sig_5 | idiosync_sig_10 | idiosync_total | idiosync_1.5 | idiosync_2 | idiosync_5 | idiosync_10 |
|---|---|---|---|---|---|---|---|---|---|
| Order 1 | 189 | 184 | 141 | 110 | 198 | 95.45455 | 92.92929 | 71.21212 | 55.55556 |
We assign positive, neutral, negative, negative-neutral, positive-neutral, and negative-positive ‘types’ based on log10 1.5-fold threshold to every order
Note, the aesthetics of the pie charts were edited elsewhere for publication.
Raw Values
| mutations | Negative | Neutral | Neutral Negative | Neutral Positive | Positive | Positive Negative | Total |
|---|---|---|---|---|---|---|---|
| Order 1 | 3 | 5 | 34 | 44 | 20 | 92 | 198 |
| Order 2 | 3 | 11 | 44 | 68 | 10 | 259 | 395 |
| Order 3 | 12 | 13 | 51 | 58 | 37 | 251 | 422 |
| Order 4 | 14 | 3 | 47 | 38 | 8 | 135 | 245 |
| Order 5 | 8 | 3 | 12 | 8 | 9 | 44 | 84 |
| Order 6 | NA | NA | 3 | 5 | 1 | 5 | NA |
Percentages
| mutations | Negative | Neutral | Neutral Negative | Neutral Positive | Positive | Positive Negative | Total |
|---|---|---|---|---|---|---|---|
| Order 1 | 1.5151515 | 2.525253 | 17.17172 | 22.22222 | 10.101010 | 46.46465 | 198 |
| Order 2 | 0.7594937 | 2.784810 | 11.13924 | 17.21519 | 2.531646 | 65.56962 | 395 |
| Order 3 | 2.8436019 | 3.080569 | 12.08531 | 13.74408 | 8.767772 | 59.47867 | 422 |
| Order 4 | 5.7142857 | 1.224490 | 19.18367 | 15.51020 | 3.265306 | 55.10204 | 245 |
| Order 5 | 9.5238095 | 3.571429 | 14.28571 | 9.52381 | 10.714286 | 52.38095 | 84 |
| Order 6 | NA | NA | NA | NA | NA | NA | NA |
The purpose of this analysis was to determine the difference in SMEs between the WT-background contribution of a given position/combination and the mean contribution of a given position or combination
In the manuscript, we also briefly mentioned how many WT SMEs have sign deviation (not just magnitude) from the mean at the 1st order. This is in fact 1.01% (2/198).
Also, in the supplementary of our paper, we show this table to show percentages of WT-bg points that deviate from the positional/combinatorial mean by multiple significance threshold (log10 1.5-, 2-, 5-, and 10-fold)
| Order | percent_1.5 | outlier_1.5 | percent_2 | outlier_2 | percent_5 | outlier_5 | percent_10 | outlier_10 | total |
|---|---|---|---|---|---|---|---|---|---|
| 1 | 66.7 | 132 | 52.5 | 104 | 24.2 | 48 | 11.6 | 23 | 198 |
| 2 | 59.5 | 235 | 47.8 | 189 | 20.5 | 81 | 9.6 | 38 | 395 |
| 3 | 51.7 | 218 | 34.4 | 145 | 19.0 | 80 | 12.6 | 53 | 422 |
| 4 | 60.4 | 148 | 40.0 | 98 | 16.3 | 40 | 11.0 | 27 | 245 |
As well as the devation of every single SME and EE from the mean, instead of just the wt background.
| Order | percent_1.5 | outlier_1.5 | percent_2 | outlier_2 | percent_5 | outlier_5 | percent_10 | outlier_10 | total |
|---|---|---|---|---|---|---|---|---|---|
| 1 | 0.5800305 | 2283 | 0.4153963 | 1635 | 0.1572663 | 619 | 0.0884146 | 348 | 3936 |
| 2 | 0.5148601 | 2356 | 0.3529283 | 1615 | 0.1440122 | 659 | 0.0804196 | 368 | 4576 |
| 3 | 0.5091712 | 1499 | 0.3383152 | 996 | 0.1348505 | 397 | 0.0889946 | 262 | 2944 |
| 4 | 0.6017857 | 674 | 0.4062500 | 455 | 0.1187500 | 133 | 0.0758929 | 85 | 1120 |
Histogram to show heterogenity at Orders 2, 3, and 4 with the log10 1.5-fold significance threshold
What percentage of positions remain significantly heterogeneous (1.5-, 2-, 5-, and 10-fold) at all orders?
| mutations | idiosync_sig_1.5 | idiosync_sig_2 | idiosync_sig_5 | idiosync_sig_10 | idiosync_total | idiosync_1.5 | idiosync_2 | idiosync_5 | idiosync_10 |
|---|---|---|---|---|---|---|---|---|---|
| Order 1 | 189 | 184 | 141 | 110 | 198 | 95.45455 | 92.92929 | 71.21212 | 55.55556 |
| Order 2 | 376 | 351 | 229 | 172 | 395 | 95.18987 | 88.86076 | 57.97468 | 43.54430 |
| Order 3 | 397 | 359 | 206 | 147 | 422 | 94.07583 | 85.07109 | 48.81517 | 34.83412 |
| Order 4 | 244 | 225 | 123 | 79 | 245 | 99.59184 | 91.83673 | 50.20408 | 32.24490 |
Like the SMEs, we checked whether there is a sign deviation between the wt EE and mean EE at each order of interaction. For pairwise we saw 37/395 (9.37%). For three-way we saw 26/422 (6.16%). For four-way we saw 17/245 (6.94%).
Import the absolute error files for each data set
How well does each model predict at each order
| Step | wtbg_mean | wtbg_median | lin_mean | lin_median |
|---|---|---|---|---|
| 1 | 33.01911 | 9.246631 | 2.822502 | 2.175932 |
| 2 | 95.83797 | 48.281320 | 1.969366 | 1.706986 |
| 3 | 53.33004 | 6.599577 | 1.314896 | 1.183863 |
| 4 | 39.87798 | 6.595830 | 1.088091 | 1.048122 |
| 5 | 26.80878 | 4.350230 | 1.055684 | 1.024498 |
| 6 | 6.66224 | 6.662240 | 1.014927 | 1.014927 |
Are the means significantly less than 1.5-fold?
| Step | wtbg_pval | lin_mean |
|---|---|---|
| 1 | 0.9999993 | 0.9999859 |
| 2 | 1.0000000 | 0.9923349 |
| 3 | 0.9999835 | 0.0051152 |
| 4 | 0.9970087 | 0.0000000 |
## Warning: Using `size` aesthetic for lines was deprecated in ggplot2 3.4.0.
## ℹ Please use `linewidth` instead.
## This warning is displayed once every 8 hours.
## Call `lifecycle::last_lifecycle_warnings()` to see where this warning was
## generated.
This plot omits some data from being viewed, however plotted means and medians are accurate. The figure is edited outside of the code for aesthetics.
Instead of predicting end point variants, we can look at how the biochemical model predicts variants along the most accessible trajectory by monitoring AE for the predicted function of each intermediate. The x-axis represent the order of the intermediate mutant that is being predicted along the most accessible trajectory.
Interestingly, for the trajectories that were transformed by the non-linear transform (both DHFR_ki trajectories, NfsA trajectories, OXA trajectories, PTE, and TEM) the previous finding of the lower model often being better at prediction is alleviated. Hence, it appears it was mostly non-specific epistasis that was causing the large error in prediction of incorporating apparent idiosyncratic epistasis into the model.
First we load the required data for network rewiring exploration, as well as establish filters for adaptive trajectories and ‘mechanistic’ trajectories.
Then we look at how many network rewiring events there are.
| epistasis at lower-order? | epistasis at higher-order? | n | percentage |
|---|---|---|---|
| FALSE | FALSE | 242 | 23.56378 |
| FALSE | TRUE | 251 | 24.44012 |
| TRUE | FALSE | 147 | 14.31353 |
| TRUE | TRUE | 387 | 37.68257 |
Of the rewired, how many are constructive and how many are disruptive?
| event_type | n | percentage |
|---|---|---|
| constructive | 93 | 24.03101 |
| disruptive | 294 | 75.96899 |
This analysis was bespoke, but surveying all network rewiring events manually and selecting genotypes along adaptive trajectories.